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Abstract 

This paper offers a new point of view on component separation, based on a model of additive com- 
ponents which enjoys a much greater flexibihty than more traditional linear component models. This 
flexibility is needed to process the complex full-sky observations expected from the Planck space mission, 
for which it was developed, but it may also be useful in any context where accurate component separation 
is needed. 

1 Introduction 

This paper describes an advanced component separation technique and its application to the analysis of 
multi-spectral observations of the Cosmic Microwave Background (CMB). It is developed more specifically 
in view of the data processing pipeline for the forthcoming Planck mission [1 of the European Spatial Agency 
(ESA). 

Component separation is a critical part of CMB data processing because no frequency channel exists 
that would be sensitive only to CMB emission. As an illustration, figure [l] shows (simulated) maps of the 
microwave emission of the sky observed in the frequency channels of Planck i. e. in spectral bands centered 
around frequencies u = 30,44,70,100,143,217,353,545,857 GHz. The red (in an arbitrary color map) 
horizontal strips along the Galactic equator, in the middle of the maps, are due to Galactic emissions (that 
is, emissions from our Galaxy). In contrast, towards the poles where the Galactic emissions are weaker, 
one can see in the center channels (100, 143 and 217 GHz), an homogeneous texture which is the signal of 
interest: the (spatial) fiuctuations (or 'anisotropics') of the CMB. Unfortunately, even in the polar areas, the 
contamination of the CMB by various astrophysical emissions (or 'foregrounds', as opposed to the cosmic 
background) is still significant. There, especially at the smallest angular scales, contamination is dominated 
by the emission of remote extragalactic objects - galaxies (or 'point sources') of various types (which emit 
radiation through similar physical processes as our own Galaxy), and galaxy clusters (which emit via the 
Sunyaev-Zel'dovich effect, hereafter: the 'SZ component'). 

There is no significant occlusion between the various physical components so that the signal Xj,(^) 
measured at a frequency u in direction ^ is a linear superposition: 

= + + (c) + • • • + 

where XJ(0 is the contribution due to a particular component x and Niy{(_) accounts for instrumental noise 
at the map level. 

It must be stressed that, even though the lowest (30, 44, 70 GHz) and highest (353, 545, 857 GHz) 
frequency channels are more contaminated by Galactic emissions, they still are very informative because 
they can be used to predict and remove the Galactic emission in the center channels, leaving only CMB and 
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noise. . . in an ideal data processing scenario. Successful scientific exploitation of the CMB measurements 
for cosmological studies critically depends on accurate cleaning of the CMB and on quantifying as well as 
possible any remaining contamination. Figure [S] illustrates on simulated data what can be achieved with our 
technique on a very large fraction of the sky. 

Even though CMB recovery is the main goal, recovering the other components also is of scientific interest. 
For instance, Planck should also yield a very rich catalog of galaxy clusters, and provide a lot of information 
about the emission of the interstellar medium in our own Galaxy. Hence, we are actually facing a problem 
of component separation from multiple observations. 

The component separation problem is not new to the astrophysics community. In the past ten years 
however, component separation for CMB observations specifically has motivated the development of a large 
variety of methods with various degrees of sophistication (see [7^ for a review). The most direct methods 
rely on modelling and subtracting foreground emission. Such methods, typically, involve a thorough un- 
derstanding of the emission processes, necessary to build the foreground model. Other methods use simple 
decorrelation, or make use of differences between properly weighted maps [11 . The processing of data from 
the WMAP satellite mission makes uses of both kinds of methods [2 . 

The large majority of existing component separation methods in the context of CMB observations rely on 
a 'mixing model' as follows: for any available direction ^ in the sky, the m channels at frequencies . . . 
provide m data points which can be collected in an m x 1 vector X(^) which is modelled as 

X(0 = ASiO + N{0 (1) 

where each entry of S{^) contains the sky-pattern for the emission of a given component. The emissivity of a 
given component depends on the frequency hence, each column of matrix A refiects the emission law of the 
corresponding component. This equation, then, forms the basis for many component separation algorithms 
but it forces a special structure on each component, namely that it 'scales rigidly with frequency'. Some 
methods are developed to solve this component separation problem when the mixing matrix A is known a 
priori. This is the case, for instance, of the work described in [2Q | [T2 | [4| [T9]. 

Component separation is a well studied problem in Signal Processing, but the nature of the available 
data and the scientific objectives call for specific techniques. In particular, one may think of resorting to 
Independent Component Analysis (ICA) methods since these 'blind' techniques may be useful to deal with 
some of the uncertainties in the data structure. In addition, columns of the mixing matrix A corresponding 
to some of the astrophysical components (e.g. components modelling the emission of our own Galaxy) are 
known a priori up to significant uncertainty, and more precise knowledge is an objective of the forementioned 
astrophysical observations. Implementations of ICA ideas in the context of CMB observations can be found 
in [13 , 18 , 6 , ^[3]. 

Compared with all the above methods, our framework allows the inclusion of components with arbitrary 
structure (at least in terms of second order correlations). This structure, as will be detailed later on, can be 
matched to prior knowledge of the nature of the astrophysical emission. Along the same spirit of matching 
a parametric model of the emissions to the data, although quite different in implementation, are the work 
described in [2TJ [9] . 

Although there is substantial motivation for letting the data alone indicate the appropriate model of the 
emissions, a fully blind approach would leave out a lot of valuable prior information in the CMB context. 
The approach described in this paper is rooted in ICA but allows for the inclusion an arbitrary amount of 
prior information, in particular by allowing components with an arbitrary structure. 

Again, the approach to component separation described herein is not specific to this particular problem 
and may be considered for application to any situation where 'expensive' data deserve special care and have 
to be fitted by a complex component model. 

The paper is organized as follows: section [2] describes our approach, section [3] focuses in specific as- 
pects; implementation is discussed in section |4] in terms of a library of components; section [5] illustrates an 
application of the method on data from the Planck sky model. 
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2 A flexible approach to component separation 



This section introduces a key ingredient of our approach: the development of a flexible model of additive 
components in which the notion of 'mixing matrix', which is central to standard ICA, is disposed of. Not 
only the mixing matrix is not needed but it is 'considered harmful'. 

2.1 Unconstrained components 

We start the exposition by considering the most general component model for an m x 1 random vector X: 
we only postulate that X is made of the linear superposition of C components: 

c 

X = Y.X- (2) 

c=l 

where each component X^ is an m x 1 random vector. For simplicity, we shall assume that all components 
have zero mean. Denote R {resp. R^) the covariance matrix of vector X {resp. of its c-th component X^). 
A key assumption of our model is mutual decorrelation between all components, which implies that the 
covariance matrix of X decomposes as 

c 

R = ^R^ (3) 

C=l 

Consider now recovering linearly a particular component X'^ from the superposition ([2|. Denote Wc the 
m X m matrix which best predicts based on X in the sense that 

Wc = argrmnE|WX-X^p. (4) 

this problem is easily solved under our assumptions and one readily finds: 

Wc = R^R"^ (5) 

At this stage, we wish to deliver a simple but important message: eq. ^ means that component separation, 
understood as computing X^ = W^X, requires only knowing R^ and R. Of course, covariance matrix R may 
(but need not to) be simply estimated from the empirical covariance matrix R of X. Therefore, component 
separation is solved if we can uniquely solve the covariance separation problem^ in the sense of identifying 
the component terms R^ in decomposition ([3|. 

In order to achieve covariance separation, some additional information or assumptions are of course 
needed. Our method jointly exploits two possibilities: considering localized matrices (for more information) 
and considering structured matrices (for more constraints). 

2.2 Localized statistics, localized separation 

This section defines the 'localized statistics' on which our method is based. Consider the component sepa- 
ration problem based on p samples on m detectors. We shall not compress this data set into a single sample 
m X m covariance matrix but rather 'localize' the statistics and form a set of localized covariance matrices. 
By 'localization', we typically mean localization in time, space, frequency, wavelet space, depending on the 
problem at hand. The simplest example of localization is suggested by the separation of non-stationary time 
series as considered in [151 US] where the data are time-indexed and one would divide the observation interval 
into Q sub-intervals and compute an estimate of the covariance of X over each of these intervals. 

More generally, we assume that the data are available as p vectors X(l), . . . , X(p), each of size m x 1 
where the i-th entry of X(j) is either the j-th sample of the i-th input signal or its j-th coefficient in some 
basis {e.g. the Fourier basis). In our application to CMB analysis, we use the spherical harmonic basis (see 
section Is]). Basis choice is discussed below; it does have an impact on separation but it does not change the 
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nature of the problem in the sense that, if the component model ([2| holds in some basis, it also holds in any 
other basis and that if a component is reconstructed in a given basis, it becomes available in any basis. 

Localized statistics are formed by dividing the index set [1, . . . ,p] into Q subsets which are called 'domains' 
in the following (time domains would be time intervals, spectral domains would be frequency bands, etc. . . ). 
Denoting Vq the q-ih. domain, partitioning of the data set reads [1, . . . ,p] = iJ^^^Vq and localized statistics 
are computed (defined) as 

where Pq is the number of coefficients in domain Vq. 

The expected value of the sample covariance matrix for the gth domain is defined/denoted as 

Rg = E Rg 

and we shall use the same notation for each component so that, these being mutually uncorrelated by 
assumption, one has the decomposition 

= ER? = e{1 ^ x^{j)x^(j)^} = ^Y: Covx^o-) (7) 

We can then perform a localized separation in the sense that the localized Wiener filter for recovering 
component c in domain q is 

X-{j) = R^R-iX(j) for j G V, (8) 

A key point is that, by definition, this filter is specialized to operate on domain g, hence it is adapted to the 
local correlation conditions. 



2.3 Constrained components and model identification 

Implementing component separation by (|8|) requires knowning the local covariance matrices R^. These are 



usually unknown and must be estimated, based on R^, the sample estimate ([6| of their sum. It goes without 
saying that, without constraining the possible values of R^, it is not possible in general to uniquely resolve (an 
estimate of) this sum into its components. Constraining a given component c is achieved by parametrizing 



the matrix set = {Rg}^=i by a parameter vector 0^ i.e. a parametric model for the c-th component is a 
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function 0^ Vf{6^) = {Rg(^^)}g=i- Some examples of these parametric models are given in section 

A parametric model ^{0) follows by assuming component decorrelation ^ and taking the global 
parameter as the concatenation of the parameter vectors of each component: 

n{0) = {ii,{0)}% = o = {0\...,e^). 

c 

The parametric model is identified by fitting it to data, that is, by minimizing a measure of mismatch 
between IZ and 1Z{0) as: 

Q 

= argmin ^(l9) where ^{0) = ^WqK{Kq,Kq{0)) (9) 

where i^(-, •) is measure of mismatch between two positive mxm matrices and Wq are positive weights. Our 
most common choice is to use 

Wq = pq and ir(Ri,R2) = ^ [trace (Rj"^R2) - logdet(Rj"^R2) - m] (10) 

because this is the form which stems from the maximum likelihood principle. Specifically, if a coefficient 
X{i) belonging to domain q is modeled as X{i) ~ A/'(0,Rg) and as independent from X{i') for i ^ i' ^ then 
is the negative log-likelihood of the model |16|. 
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2.4 Summary 



In summary, our method is based on the fohowing steps and design choices: 

1. Choice of a basis to obtain coefficients X{i) (original space, Fourier or spherical-harmonic space, wavelet 
space,. . . ). 

2. Choice of domains {^q}^^i to localize their second-order statistics 7^, 

3. Choice of a model 1Z^{0^)^ for the contribution of each component c to the covariance matrix 
over each domain q. 

4. Minimization of the matching criterion ^ to obtain the estimate ^, hence estimates for the param- 
eters of each component. 

5. Estimation of the coefficients of each component by X^{j) = R^(^^)Rg(^)~^X(j), that is the version 
of (|8| based on estimated parameters. 

6. Reconstruction of the components from their coefficients X'^{j). 

At this stage, most of the statistical framework is in place but our method is not completely specified yet 
because of the flexibility in design choices in items [l] [2] and [3] above among other issues. Next section 
discusses some of them. 

3 Discussion 

3.1 Correlated or multi-dimensional components 

We wish to contrast the standard form X = AS + of the noisy linear component model of eq. ([T]) with the 
unconstrained model X = "^^X^ of eq. Obviously, the standard model is included in the unconstrained 
model since one may always write A^S* -\- N = Ylc=i with X^ = S-cSc for 1 < c < n and X^ = X"^+^ = 
where a.c denotes the c-th column of A and Sc denotes the cth entry of S. 

A component X^(^) which can be written as X^(^) = ac^dO is fully coherent in the sense that any two 
entries of vector X^(^) are 100% correlated. Of course, this is because all the randomness in X^(^) = ac^dO 
comes from a single scalar random variable SdS,)- A fully coherent component contributes exactly the same 
pattern on all sensors (or channels) up to a fixed proportionality coefficient. It has a rank-one covariance 
matrix: R^ = Cov(5'c(0)^caJ; for this reason, it could also be called a one-dimensional since it leaves in the 
ID space spanned by vector a.c or in the ID range space of R^. 

In our application, the CMB is a fully coherent component: from one frequency channel to another, 
only its overall intensity changes (ignoring the frequency-dependent resolution). This is in contrast with, 
for instance, the Galactic emission, which cannot be represented over the sky as a fully coherent component 
(see fig.|2|. 

Assume now that, in the model X = AS-\-N^ two sources, say i and j, are statistically dependent while all 
other pairs of sources in S are mutually independent. These two sources contribute a^S'^ -\-ajSj to X and we 
decide to lump them into a single component denoted A^ for some index c: X^ = a^S'^ + ^jSj. Then X can 
still be written as in eq. ([2| and all the components X^ are independent, again. In other words, we represent 
here the contributions of two correlated sources as the contribution of a single 2-dimensional component: 
we now have one less component but it is 2-dimensional. Clearly, this component is not fully coherent: it 
contributes a pattern on sensor m which is not proportional to the pattern contributed to another sensor m' . 
One can obviously define multidimensional components of any dimension, each one possible representing the 
contribution of several correlated sources. In section [Sj we use a 4-dimensional component to model Galactic 
emission. 

Note that, in eq. the noise term is included as one of the components. If the noise is uncorrelated 
from channel to channel, as is often the case, then the noise component is m-dimensional (the largest possible 
dimension) . 

More generally, our model does not require any component to be low dimensional. Rather, our model 
is a plain superposition of C components as in eq. ([2|. None of these components is required to have any 
special structure, one-dimensional or otherwise. 
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3.2 Library of components 



We call a collection of parametric models 0^ 1Z^(0^) a library of components. In practice, each member of 
the library must not only specify a parametrization 9^ Vf{9^) but also its gradient and related quantities 



(see sec. 4.2). 



Typical examples of component models are now listed. 

1. The 'classic' ICA component is fully coherent (one dimensional) X^{i) = 3.cSc{i)- Denoting a^^ the 
average variance of Sc{i) over the qth domain, the contribution of this component to Hq is the 
rank-one matrix 



q 

This component can be described by an (m + Q) x 1 vector 0"^ of parameters containing the m entries 
for the moment. 

A d-dimensional component can be modeled as 



of ac and the Q variance values a^^. Such a parametrization is redundant, but we leave this issue aside 



Rc _ A p At 
q — -^c^^ qc-^c 

where Ac is an m x d matrix and Pqc is an d x d positive matrix varying freely over all domains. This 
can be parametrized by a vector 6^ of mx d-\-Q x d{d-\- 1)/2 scalar parameters (the entries of Ac and 
of Pqc)- Again, this is redundant, but we ignore this issue for the time being. 
Noise component. A simple noise model is given by 

R^ = diag((7l^...,(7^) 

that is, uncorrelated noise from channel to channel, with the same level in all domains but not in all 
channels. This component is described by a vector 0"^ of only m parameters. A more general model 
is R^ = diag(cr^^, . . . ,cr^g) meaning that the noise changes from domain to domain; then parameter 
vector 6^ has size mQ x 1. 

As a final example, for modeling 'point sources' in spectral domain, one may use R^ = R^. Such a 
component contributes identically in all domains corresponding to a flat spectrum. If, for instance, we 
assume that this contribution RJ is known, then the parameter vector is void. If RJ is known up 
to a scale factor, then 6^ is just a scalar, etc. . .In the demonstration test discussed in [5) however, we 
consider instead the sum of Galactic and point source emission as one single 4-dimensional component 
- a choice which the flexibility of the present model allows us to do. 



3.3 About localization 

There are two strong, very different motivations for localizing the statistics. 

Localization for accuracy. The first motivation is separation accuracy. If the strength of the various 
components (including noise) varies significantly across the domains Pg, reconstruction is improved by the 
localized filter. Indeed, the best linear reconstruction of X^{j) based on X{j) would be, as seen above, 
obtained as X%j) = Cov{X%j)) Coy{X{j))-^X{j). This requires knowing both Cov(X^(j)) and Cov(X(j)). 
In practice, it seems difficult to obtain estimates for these matrices for all samples, i.e. for each value of j. 
However, if they do not vary too much for all values of j across a domain P^, then a good approximation 
to the best reconstruction is ([8| i.e. the reconstruction filter also is localized, taking advantage of the 'local 
SNR conditions' on domain Vq. 

Localization for diversity/ identifiability. Second, the diversity of the statistics of the components 
over several domains is precisely what may make this model blindly identifiable. For instance, in the basic 
ICA model (all components are one-dimensional, no noise), if X{i) are Fourier coefficients and Vq are spectral 
bands, it is known that spectral diversity (no two components have identical spectrum) is a sufficient condition 
for blind identifiability. 
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3.4 About blindness and the Fisher information matrix 



Is this a blind component separation method? It ah depends on the component model. If ah components 



are modeled as 'classic' ICA components (see Sec. 3.2), then the method is as blind as regular ICA. Our 



approach, however, leaves open the possibility of tuning the blindness level at will by specifying more or less 
stringent models 0'^ for some or all of the components. 

Of course, it may be difficult to predict if a given parametrization ensures the identifiability of the model: 
this is to be discussed on a case-by-case basis. However, over-parametrization can be tested numerically 
because the Fisher information matrix (FIM) is available in our framework. It has a natural block structure 
where the block related to a pair (c, c') of components is the matrix of size \0^\ x |: 



1 / B¥i!^iO^^ 

q \ 



W^^VWl (11) 



under the assumptions which make the mismatch measure ^(^) proportional to the log-likelihood (see 



sec. 2.3). The FIM is also used for computing (approximate) error bars. 



4 Implementation 

We discuss the practical issue of actually minimizing the mismatch ^(^) using an arbitrary library of com- 
ponents. 

Note that for a noise-free model containing only 'classic ICA' components (no other constraints than being 
one-dimensional), criterion <p{0) boils down to a joint diagonalization criterion which can be very efficiently 
minimized by a specialized algorithm [17]. For a model including only unconstrained multi-dimensional 
components and noise, it is possible to use the EM algorithm [6]. EM, however, is not convenient for general 
component models and, in addition, it appears too slow for our purposes. Therefore, we had to consider 
more efficient and more general optimization procedures. 



4.1 Optimization 

We found the Conjugate Gradient (CG) algorithm well suited for minimizing ^(6>). Its implementation 
requires computing the gradient d(j)/dO and (possibly an approximation of) the Hessian d'^(j)/dO'^ for pre- 
conditioning. Since (j){0) actually is a negative log-likelihood in disguise, its Hessian can classically be 



approximated by F(^), the Fisher information matrix (FIM) of eq. (11). These computations offer no 
particular difficulty in theory but we aim at an implementation it in the framework of a library of components 
i.e. computations should be organized in such a way that each component model lZc{0'^) works as a 'plug-in'. 

4.2 Computing derivatives 

Computing the gradient. The partial derivative of (j) with respect to 0^ takes the form 
where matrix Gq{0) is defined as 



G,(^) = \w,-R-\e) (r,(^) - R,) R-i(0). (13) 

Hence the computation of dcjy/dO at a given point = (6>^,...,^^) can be organized as follows. A first 
loop through all components computes Tl{6) by adding up the contribution Vf{0^) of each component. 
Then, a second loop over all Q domains computes matrices {Gq{6)}^^^ which are stored in a common work 
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space. Finally, a third loop over all components concatenates all partial gradients d(j)/dO'^^ each component 



implementing the computation of the right hand side of (12) in the best possible way, using the available 
matrices 

Computing an (approximate) Hessian. The Fisher information matrix can be partitioned component- 



wise and each block be computed according to eq. (11). Therefore its computation can be organized with 



a double nested loop over c and c' as soon as the code implementing a given component is able to return 
the matrix set { — — }g=i- ^ straightforward implementation of this idea may be impractical, though, 
because this is a set of |6>^| xQ matrices, possibly very large. This problem can be alleviated in the frequent 
case where components have 'local' variables. 

Local variables. Consider the case when 0^ can be partitioned into Q + 1 blocks: 0^ = (^g, ^J, . . . , ^g) 
where, for g > 0, the sub- vector 0^ affects only while 6q collects all the remaining parameters i.e. those 
which affect the covariance matrix over two or more domains. We then say that this component model has 
'local variables'. This is a fairly frequent situation which occurs for instance when the power of a component 
can be freely adjusted in each domain (the simplest example is the 'classic' ICA component: = acajja^^, 
for which Oq = a and 0^ = for q = 1, . . . , Q. The global parameter vector inherits this structure by 
being partitioned accordingly into a 'global part' = (^o^ • • • ^ ^o^) Q local parts Oq = (^J, . . . , ^^). 

A first and major benefit of such a local/global partitioning is that it introduces many zero blocks in the 
FIM since then [F(6>)]gg/ = for 1 < g 7^ < Q, allowing the computation of the pre-conditioned gradient 
Y(0)~^d(j)/ dO to be organized much more efficiently. 

Another benefit of the local/global partitioning is that it makes it easy to implement a 'local optimization': 
during global optimization over 6>, it is possible at any time to loop through all Q domains and to solve in 
each domain, possibly exactly, the sub-problem min^i^ i^(Rq, Rg(6>0 7 ^q) ) which is, of course, of much smaller 
size than the original problem in most cases. 

4.3 Indeterminations and penalization. 



Examples of section 3.2 show that 'natural' component par ametrizat ions often are redundant. From a 
statistical point of view, this is not important: we seek ultimately to identify IZ'^ = {R^}^^;^ ^ member 
of a family described by a mapping 0'^ Vf{9^) but this mapping does not need to be one-to-one. The 
simplest example again is for R^ = acaja^^ which is invariant if one changes ac to aac and cFqc to a~^cFqc. 
Therefore, there is a ID set of pairs (accrue) achieving the global optimum but they all correspond to a 
unique value of R^ itself, which is thus perfectly well defined at the optimum and is the unique quantity 
needed to reconstruct the c-th component in domain q. 

The only serious concern about over-parametrization is from the optimization point of view. Redundancy 
makes the (j){0) criterion flat in the redundant directions and it makes the FIM a singular matrix. Finding non 
redundant re-parametrizations is a possibility, but it is often simpler to add a penalty function to (j){0) for any 
redundantly parametrized component. For instance, the scale indetermination of the classic ICA component 
R^ = acajcr^^ when parametrized Oq = ac and 6^ = a^^ {q > 1) is fixed by adding (j)^{6^) = ^(||ac|p) to 0(6>), 
where g{u) is any reasonable function which has a single minimum at, say, u = 1. Of course, the addition 
such a term should be reflected in the gradient and the Hessian of the matching criterion. 



5 Results on simulated Planck data 

This section describes an application of our framework to a CMB data set. Although we go into some detail, 
several issues cannot be discussed due to limited page space. The main objective here is illustrative. 

5.1 Data 

In preparation for data acquisition by the Planck mission, the ability to perform component separation is 
evaluated by resorting to a realistic set of simulated data which is developed within the Planck collaboration 
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as the 'Planck sky model' (PSM). This suite of programs is used to generate random realizations of the sky 
at the Planck frequency range compatible with the present knowledge of all identified component emissions 
and aims at capturing many of the intricacies (both from the sky and from the instrument) expected from 
real data. 

We use sky maps simulated at the 9 frequencies of Planck channels. The instrument point spread function 
is modelled as a Gaussian beam whose width decreases with increasing frequency: the beams' FWHM are 
33, 24, 14, 10, 7.1, 5, 5, 5, 5 in arcminutes. The beam effect is visible in figure |2] where the angular size of 
the galaxy clusters (via the SZ effect), point sources and CMB anisotropics decreases from top to bottom 
refiecting the increasing resolution. 

Planck does not take a 'snapshot' of the sky. Rather, sky maps are (painfully) computed from sky scans. 
Due to various technological constraints, the scanning strategy does not guarantee that all pixels are seen 
(or 'hit') equally often. As an important consequence, the variance of the noise in each pixel depends on 
its hit-count. In our simulated data set, the noise is modelled as Gaussian, independent from channel to 
channel, from pixel to pixel and with a variance inversely proportional to the hit-count. See fig. [3] for a sky 
map of hit counts corresponding to a one year survey. The noise variance in one pixel for one hit depend on 
the frequency channel and are given for the 9 channels by 1027, 1434, 2383, 1245, 753.6, 609.1, 424.5, 154.9, 
71.8 /jK^ for the maps used in this simulation. 

The full-sky maps used as inputs to our experiments are shown on figure [l] Figure |2] zooms in and shows 
the various physical components used as ingredients: CMB, Galactic emissions, galaxy clusters (via the SZ 
effect), point source emissions (due to infrared- and radio-galaxies). Galactic emission is shown as a single 
component but is actually made up of three physical components due to free-free, synchrotron and dust 
emissions. 



5.2 Building localized statistics 

The data are processed in Fourier space. On the sphere, a Fourier basis for band limited functions is a 
doubly-indexed set {¥£^{0 \ < £ < ^max, — ^ < m < ^ } of orthonormal functions with the property that 
AYim = —£{£ + l)y£m where A is the spherical Laplacian. Index £ is called the angular frequency. The 
coordinates a^m of a spherical function on such a basis are called its 'spherical harmonic coefficients'. The 
quantity q = Xlm=-^ l^^mP is the angular spectrum of the function. If this function is a realization 
of a stationary process, then q = Eq is the angular spectrum of the process. The CMB is thought to be 
a realization of Gaussian stationary process; accurate estimation of its angular spectrum is one the main 
scientific products expected from CMB observations. 

We compute localized statistics in both space and frequency: first the sphere is decomposed in three 
zones: a small zone where Galactic emission is so strong that no processing is attempted there, a second 
zone farther away from the Galactic center and finally a zone of weak Galactic emission which includes both 
ecliptic poles. The corresponding masks are shown on fig [3j We also localize in Fourier space by defining 
Q = 120 domains which correspond to 120 bins of angular frequency. The width 6^ of these spectral bins 
increases with the angular frequency ^, as follows: = 2 for < £ < 29, 5^ = 5 for 30 < £ < 149, = 10 
for 150 < ^ < 419, 6^ = 20 for 420 < ^ < 1199, = 50 for 1200 <£< 2000. 

For each of the two spatial domains, we build spectral statistics as follows: the zone of interest is isolated 
by using (an apodized version of) the masks shown on figjsj the spherical harmonic coefficients for each of 
the 9 frequency channels are computed and collected in a 9 x 1 vector Xi^ for each value of the pair (^, m) 
up to -^max = 2000; spectral matrices are then computed for each £ < 2000 as = ^m-^^rn^lm- 
this stage, since symmetric beams are assumed, their effect can be corrected as ^ W^'KiWi where Wi is 
a diagonal matrix. We also correct for incomplete sky coverage by dividing each R^ by a factor /sky which is 

the fraction of the sky left after masking. Finally, the spectral matrices are binned into Hq = ^^^-^r-^^ 

where hq{£) is the top hat function for the qth bin. These matrices collect an effective number pq of Fourier 
modes given by Pq = /sky hq{£){2£ + 1). 
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5.3 Component analysis 

We now fit a parametric model to tiie localized covariance matrices built at previous section. We fit inde- 
pendently the two zones of interest and then stitch the resulting maps together. In this illustrative example, 
we fit a model made of C = 4 components 

Kq = Kl"^^ + R^/ + Rf ^ + R^°^'^ 1 < g < 120 

with the following parametrization/constraints: 

• The CMB component has a known emission law acmb but unknown angular spectrum c{q) so R^"^^ = 

acmbaJmb^cmb(<?) and r"'^ {Ccmb (<?)}• 

• The SZ component, as the CMB, is fully coherent and has a known emission law. Hence, we take 
R^^ = aszat^Csz(^) and 0^^ = {csz 

• We try to capture all Galactic emission (which is far from being coherent) together with emission from 
other galaxies - the 'point source^ component^ in a 4-dimensional component without other constraints, 
that is, we set up the 'Galactic' component model as R|^^ = AgaiPgAg^^^ with Agai an unconstrained 
9x4 matrix and Pg a 4 x 4 an unconstrained positive matrix so 0^^^ = {Agai, Pg}- 

• For the noise contribution, we rely almost entirely on the instrument characterization: only m global 
parameters are adjustable as described in the third example of component in section [3]2j that is we set 
up: R^°^®® = diag(ain2q, Q^m^mg) where riiq is known so = {a^}. The fixed spectra Uiq are 
computed from the hit-counts maps and other figures given above. 

Hence the global parameter vector is 6 = [ccmb(^), Csz(^), Agai, Pg, a^]. 

Note that the angular power spectrum of the CMB is obtained as a by-product of the global fitting 
procedure. 

5.4 Some results 

We show and comment some results from our best-fit model. 

Goodness of fit. The top panel of figure [i] displays the values of WqK{Ilq^'Rq{0)) versus the spectral 
domain index q (actually: versus the average angular frequency £ for this domain). The overall mismatch 
measure ^(6>) is just the sum of these quantities over all domains: see eq. ([9|. It is possible to predict the 
average mismatch value (in the asymptotic regime of many Fourier modes per domain) when the model 
holds. This provides at least a reference value in terms of goodness of fit. This value (and twice this 
value) are displayed as a horizontal line on fig. [4j The fit appears good for i > 500 but not so good for 
200 < i < 500. This reflects the difficulty of modelling the complex galactic structure (since this component 
is important at this range scale) and also possibly the fact that the assumptions required for predicting the 
average mismatch are not met in this domain. The bottom panel of the figure shows how the mismatch 
increases when the contribution of any one of the components (except noise) is removed from the model. In 
this case, the mismatch explodes, showing that all fitted components are significant here. 

Power decomposition at sensor level. The decomposition ^ allows to find the contribution of each 
component on each detector by domain, i.e. as a function of angular frequency in the present case. Figure [5] 
displays for all detectors {j = 1, ... 9) the values of [R^(^)]jj versus the center frequency of the qth spectral 
domain. In other words it shows estimated band-averaged angular spectra for all fitted components. It also 
shows the sum of these contributions over all components as well as the values of [Rg]jj. Actually the plot 
of the former quantity is almost entierly masked by the plot of the latter, evidencing a near perfect fit of the 
global angular spectrum. 

Figure [i] shows the cross-spectra between channels, that is the values of [Ilq{6)]ij for 4 selected pairs of 
channels. 

CMB angular spectra. Left panel of figure [t] shows the estimated angular spectrum cf^^ with error 
bars computed from the Fisher information matrix. The angular spectrum of the CMB process is known 
in this simulated data set and displayed as a solid line. An excellent recovery is observed up to angular 
frequency £ = 2000, together with plausible error bars. 
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CMB map. The CMB map recovered through the locahzed Wiener filter ([8| is shown in Figure [sj It 
has no visible contamination but reconstruction is not over the whole sky: the small Galactic region (shown 
on fig. |3| is not reconstructed because the Galactic emission is so strong that avoiding any contamination is 
very difficult. The bottom map of fig. |8]is the residual i.e. the error between true and reconstructed CMB 
maps. Some Galactic residuals are now visible indicating that the localization strategy should be improved 
(the Galactic emission is known to vary a lot across the sky, and especially near the Galactic plane). 

The angular spectrum of the residual map (CMB error map) is shown on the right panel of fig. [t] The 
linear (on this log scale) trend indicates that the error is noise-dominated (as also seen from the residual 
map). 

Galactic and point source emission reconstruction. The most complex component is due to the 
radiation from the interstellar medium in our Galaxy (Galactic emission) as well as other galaxies (which 
contribute most of the emission of extragalactic 'point sources'). The present work focuses on CMB recon- 
struction, and no attempt has been made to separate the Galactic emissions on the basis of the astrophysical 
radiation emission process (synchrotron emission, 'free-free' emission, or grey body emission from dust par- 
ticles), nor on the basis of their origin (within our own Galaxy, or in other galaxies). For this reason, the 
sum of these emissions outside our Galactic + point source mask has been tentatively represented by a 
4-dimensional 'catch all' model as explained above. The result of the separation for this component is shown 
in fig. [9) In spite of small discrepancies in the reconstruction on large angular scales, the result is quite 
satisfactory. Angular power spectra for this model, displayed in fig. |5] clearly correspond to the sum of the 
steep Galactic power spectrum, and a fiat plateau at large angular frequency ^, due to the contribution of 
emissions from the large number of extragalactic sources not masked by our point source mask. 

6 Conclusion 

The component separation technique discussed in this paper offers great modeling ffexibility from the re- 
alization that separation can start with covariance matrix separation — i.e. the identification of individual 
component terms in the domain- wise decomposition ^ — followed by data separation according to ([5|. Too 
much fiexibility may also introduce difficulties, though. Indeed, whether or not minimizing the covariance 
matching criterion (j){6) leads to uniquely identified components depends on the particular choice of com- 
ponent models. In our approach, however, the amount of constraints imposed on any component is fully 
tunable. By using more or less constrained components, the method ranges from totally blind to semi-blind, 
to non-blind. 

Some other good points are the following. Speed: the method is potentially fast because large data 
sets are compressed into a much smaller set of covariance matrices. Accuracy: the method is potentially 
accurate because it can model complex components and then recover separated data via local Wiener filters 
which are naturally adapted to the local SNR conditions. Noise: the method can take noise into account 
without increased complexity since noise is not processed differently from any other component. Prior: the 
implementation also allows for easy inclusion of prior information about a component c if it can be cast in the 
form of a prior probability distribution Pc{0^) in which case one only need to subtracting log Pc{0^) from (j){0) 
and the related changes can be delegated to the component code. Varying resolution: in our application, 
and possibly others, the input channels are acquired by sensors with channel-dependent resolution. Accurate 
component separation requires to take this effect into account. This can be achieved with relative simplicity 
if the data coefficients entering in Rg are Fourier (or spherical harmonic) coefficients. Built-in goodness 
of fit via the mismatch measure ([9|. 

This paper combines several ideas already known in the ICA literature: lumping together correlated com- 
ponents into a single multidimensional component is in [5]; minimization of a covariance-matching contrast 
(j){0) derived from the log-likelihood of a simple Gaussian model is found for instance in [17]; the extension to 
noisy models is already explained in [8] . The current paper goes one step further by showing how arbitrarily 
structured components can be separated and how the related complexity can be managed at the software 
level by a library of components. 

About Gaussianity. The specific choice of the matching criterion ([9| stems from a Gaussian model for 
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the signal coefficients and the linear Wiener filter ([5| is optimal only for Gaussian signals. Hence, there is 
no doubt that an improved statistical efficiency could be gained by resorting to non Gaussian models and 
to non-linear filtering for strongly non Gaussian data. However, this is more easily said than done, non 
Gaussian modeling/processing being often more difficult and costly to implement. By sticking to simple 
Gaussian assumptions, we can afford to model a non trivial correlation structure (through domains and 
through sensors) so it is not clear yet what the good trade-off is. It may depend very much on the scientific 
objectives (recovery of CMB versus recovery of other components, ability to predict estimation errors, . . . ) 
and technical constraints {e.g. fast codes are important for assessment via Monte-Carlo runs). Intensive 
work is in progress within the Planck collaboration to assess the performance of various approaches for CMB 
analysis. 
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Figure 1: Simulated full-sky observations at the Planck frequencies: 30, 44, 70, 100, 143, 217, 353, 545, 847 
GHz. Color scale: ±0.5 mKRJ 
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Figure 2: A zoom on a patch of size 17x17 degrees, close to the Galactic plane for the channels at 30, 70, 
217 and 857 GHz. The figure shows (in arbitrary color scales) the contributions of each physical component. 
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Figure 3: Left: Hit-counts map for the 30 GHz channel. In each pixel, noise variance is inversely proportional 
to it. The number of hits in this map goes from around 40 to 2320. Right: the zones (masks) used in the 
analysis: polar zone (red), intermediate zone (green). Galactic zone and point source masking (blue) which 
is ignore in our analysis. 
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Figure 4: Top: Spectral mismatch versus angular frequency for the best-fit model. Bottom: mismatch when 
any one of the components (but noise) is removed. See text for more details. 
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Figure 5: Angular spectra on channels at 30, 70, 217, 857 GHz estimated with our best in the polar zone 
(mKRJ^). The line labelled stats shows the values of [Rg]jj; the line labelled model the values of [Rg(^)] 
the latter is hidden by the former almost everywhere. 
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Figure 6: Cross-spectra for the channel pairs 30-70 GHz, 70-143 GHz, 143-353 GHz and 353-857 GHz and 
our fit with cmb+sz+(Galaxy+PS)+noise in polar zone (mKRJ^). Same conventions as in figure [s] 
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Figure 7: Left: Estimated angular spectrum of the CMB (in polar zone) in mKCMB^ with ±2cr error bars 
given by the Fisher information matrix. We (conventionally) plot ^(^ + l)Q/27r rather than the plain angular 
spectrum q. Right: Angular spectrum of the reconstructed CMB and of the residual in mKCMB^. The 
red and the black curves are the same reference input CMB spectrum which has been used to generate the 
CMB component. 
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Figure 8: Top: input CMB map; middle: estimated map outside of a small Galactic mask (color scale 
±0.5 mK CMB). Bottom: difference map (color scale: ±0.05 mK CMB). 
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Figure 9: The Galactic component seen by the 30, 70, 217 and 857 GHz channels. Left column: dust + free- 
free + synchrotron from the input maps in MJ/sr. Scale between and 0.01 for the 30 and 70 GHz channels, 
between and 0.1 for the 217 GHz channel and between and 7 for the 857 GHz channel; center column: the 
4-dimensional component found by SMICA with the same scale range; right column: the difference, rescaled 
by a factor of 10 to emphasize the errors. 
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